1.0 SUMMARY


In this project, a synthetic deoxyribonucleic acid (DNA)-based memory called
ComDMems (Combinatorial DNA Memories) was developed. This research demonstrates that
the combinatorial method can feasibly yield billions of covert, synthetic DNA memory
strands that carry object and process information. A key component of this innovation is the
combinatorial method of bio-memory design and detection, which encodes item or process
information as numerical sequences represented in DNA. This DNA data structure can be read
by the wet-laboratory method polymerase chain reaction (PCR) and then algorithmically decoded
to retrieve a virtually unlimited amount of item or process information that has been stored in
the combinatorial memories.

ComDMem  is  a  content  addressable  memory  (CAM)  as  opposed  to  a  standard  random 
access  memory  (RAM).  A  standard  RAM  goes  directly  to  a  physical  address  and  returns  the 
contents.  A  CAM  uses  the  content  of  the  input  to  direct  the  search  of  its  entire  memory  for  the 
specified  data  word. 

ComDMem achieves CAM behavior when multiple parallel PCR probes, each specific for a
certain piece of information, search the ComDMem for memories that contain those pieces of
information. In this way, all memories associated with one or more concepts can be retrieved
and decoded in parallel.

The system 3DViews was created to provide visualization of oligo interactions. A tiled
display system was built to increase the size of the total display without giving up fine detail.

This Air Force Office of Scientific Research (AFOSR) funded project was performed as a
collaboration between Air Force researchers and contractors at the Air Force Research
Laboratory in Rome, New York. Air Force researchers focused on the visualization modeling
and 3DViews work described in Section 4.6. This report describes results for the complete
AFOSR project.
2.0 INTRODUCTION


In [1]-[7] it has been shown that the hybridization that occurs between a DNA strand and
its Watson-Crick complement can be used to perform mathematical computation. This research
addresses how the massive parallelism of DNA hybridization reactions can be exploited to
construct a DNA-based associative memory.

Single strands of DNA are polymers of the nucleotide bases adenine (A), cytosine
(C), guanine (G) and thymine (T) and thus can be represented by sequences of the letters A, C,
G, and T. DNA sequences have an orientation that reflects the asymmetric covalent linking
between consecutive bases in the DNA strand backbone; e.g., 5'AACG3' is distinct from
5'GCAA3', but it is identical to 3'GCAA5'.

DNA can be single-stranded (ssDNA) or double-stranded (dsDNA). ssDNA most easily
forms a double-stranded helix with its oppositely directed reverse complement. To obtain
the 3'→5' reverse complement of a 5'→3' strand of DNA, substitute A with T and C with G and
vice-versa. For example, the 3'→5' reverse complement of 5'TCGCA3' is 3'AGCGT5'. If x is a
DNA sequence, let $\overline{x}$ denote its reverse complement in the opposing 3'→5' direction.
For example, $\overline{5'TCGCA3'}$ = 3'AGCGT5'. Henceforth, strands without an overline are
5'→3' and strands with an overline are 3'→5'. A dsDNA duplex formed between a strand and its
reverse complement is called a Watson-Crick (WC) duplex, e.g., 5'TCGCA3' paired with
3'AGCGT5'. Note that non-WC duplexes can form; such a formation is called a
cross-hybridization. Cross-hybridizations are undesirable, so the synthetic DNA must be
carefully designed to ensure that cross-hybridization does not occur. The length of an ssDNA
strand or a dsDNA WC duplex is the number of bases or base pairs (bp), respectively, in the
strand. For example, TCGCA is called a 5-mer (mer is short for polymer) and the length of the
WC duplex 5'TCGCA3' / 3'AGCGT5' is 5 bp. See Figure 1.
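The complement rule above can be sketched in a few lines of Python. This is only an illustration: the function names `complement_3to5` and `reverse_complement` are assumptions introduced here, not names used in the report or in SynDCode.

```python
# Hypothetical helper names; only the A<->T, C<->G substitution rule comes from the text.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_3to5(strand_5to3: str) -> str:
    """Base-by-base complement, to be read in the opposing 3'->5' direction."""
    return "".join(COMPLEMENT[base] for base in strand_5to3)


def reverse_complement(strand_5to3: str) -> str:
    """The same complementary strand, rewritten in the 5'->3' direction."""
    return complement_3to5(strand_5to3)[::-1]


print(complement_3to5("TCGCA"))     # AGCGT, i.e. 3'AGCGT5'
print(reverse_complement("TCGCA"))  # TGCGA, i.e. 5'TGCGA3'
```

Both outputs describe the same physical strand; only the direction in which it is written differs.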
Coding Strands for Ligation (5'→3')     Probing Complement Strands for Reading (5'→3')

TACGCGACTTTC                            GAAAGTCGCGTA
ATCAAACGATGC                            GCATCGTTTGAT
TGTGTGCTCGTC                            GACGAGCACACA
ATTTTTGCGTTA                            TAACGCAAAAAT
CACTAAATACAA                            TTGTATTTAGTG
GAAAAAGAAGAA                            TTCTTCTTTTTC

Must Have: Watson-Crick (WC) duplexes, e.g., 5'TACGCGACTTTC3' paired with 3'ATGCGCTGAAAG5'.
Must Avoid: cross-hybridized (CH) duplexes, i.e., pairings between strands that are not
reverse complements of each other.

Figure 1: A DNA Code


Hybridization assays offer the possibility of simultaneously processing trillions of bits of
information. In DNA hybridization assays for biomolecular computing, concatenated DNA
strands can be used for multiple purposes. They can be used to store, write, read and retrieve
information. Hybridization assays with DNA strands are also used to separate, manipulate,
identify and address molecules in many other important experiments beyond biomolecular
computing [9]-[11]. Figure 2 shows a CAM concept using DNA for the probe and store.

Figure 2: Scheme for Parallel Search for Multiple Associations in DNA
In DNA biomolecular computing, occasions can arise where a sample containing several
distinct sequences of DNA needs to be analyzed. For example, each individual sequence in a
mixture of DNA could:

(i) encode a solution to a mathematical problem [12],

(ii) be stored information associated with an entity [13],

(iii) be a taggant or label associated with a target [14].

In these cases, the composition of each DNA strand in the mixture needs to be determined so
that each mathematical solution, memory and/or target, respectively, can be retrieved. This
research shows how a single, parallel battery of reactions performed on a mixed DNA sample
containing an arbitrary subset of several double-stranded DNA sequences can be used to
determine the composition of each sequence in the mixture.

Further, this research demonstrates that the combinatorial method employed can feasibly
yield billions of covert, synthetic DNA memory strands that carry object and process
information. A key component of the innovation is the combinatorial method of bio-memory
design and detection, which encodes product, item or process information as a numerical
sequence represented in DNA. This DNA data structure can be read by the wet-laboratory method
polymerase chain reaction (PCR), whose output can also be converted into an electrical signal,
and then algorithmically decoded to retrieve a virtually unlimited amount of item or process
information that has been stored in the combinatorial memories. In Figure 3, data is encoded
using DNA substrands, with the whole library strand containing related associations, i.e.,
"a memory."
[Figure 3 labels: relational table; Primary Occupation; memory strand]

Figure 3: The DNA Associative Array Relational Table


The system 3DViews was created to provide visualization of oligo interactions. A model
was created to represent the physicality of oligos: their structure, movement and interactions. A
user interface was created to graphically display the model results. For DNA libraries that are
large enough to be of interest, the model's graphics output consists of large and detailed images.
Current computer screens do not have sufficient pixel density to display the level of detail
desired from a simulation. A tiled display system was built to increase the size of the total
display without giving up fine detail.
3.0 METHODS, ASSUMPTIONS, PROCEDURES


3.1  Numerical  Sequences  Represented  in  DNA 

Throughout the remainder of this report all lower case variables are natural numbers, e.g.,
n, q, s, and t.

A fixed set of n·q relatively short t-mers of ssDNA is called a t-DNA n×q table code
and is denoted by DNA_TC(n,q,t). See Figure 4 for an example of a DNA_TC(5,2,10), where n
is the number of positions along the long strand, q is the number of rows (values per position)
and t is the length of each substrand. The sequences in a given DNA_TC(n,q,t) are called
table-mers. A ssDNA memory library is the collection of $q^n$ relatively long (n·t)-mer strands
of ssDNA that are concatenated from a fixed DNA_TC(n,q,t). A member of a ssDNA memory
library is called a ssDNA memory.

Key  Idea:  Any  finite  numeric  sequence  can  be  encoded  as  a  ssDNA  (or  dsDNA)  memory  and 
vice-versa. 

For example, using the table-mers from Figure 4, the binary sequence 01101 is encoded
as CGTCCATCGT CGCAAGCTGA AGTGGATGCG TCGGTAAGCG TCGGAGTGCT. This encoding is possible
because only certain collections of sequences (partitioned by font type in Figure 4) are allowed
in each position (e.g., Arial = position 0, Comic = position 1, etc.) and, within each collection,
distinct strands are assigned distinct numerical values (e.g., CGTCCATCGT = 0 and
GCAGAAGCCA = 1 for position 0). It is straightforward to see that table-mers can be used to
make a table that in turn can be concatenated to make $q^n$ distinct longer DNA memories, one
for each numeric sequence with n digit positions where each digit can range from 0 to q-1.


value   position 0    position 1    position 2    position 3    position 4

0       CGTCCATCGT    CATTCGCGGA    ACAGTTGCCG    TCGGTAAGCG    GAGCGAACCA
1       GCAGAAGCCA    CGCAAGCTGA    AGTGGATGCG    TGCACGAGAC    TCGGAGTGCT

Figure 4: A DNA_TC(5,2,10)
Each table-mer in the DNA_TC(5,2,10) in Figure 4 can be labeled by an ordered pair
(position, value). The first coordinate corresponds to the position and the second coordinate
corresponds to the value. Font type only indicates position. For example, (0,1) = GCAGAAGCCA,
while (2,0) = ACAGTTGCCG.
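As an illustration of the (position, value) labeling, the following minimal Python sketch encodes a digit sequence by concatenating table-mers and recovers it by slicing. The names `TABLE`, `encode` and `decode` are assumptions made for this example. The position 1 table-mers are written with G throughout, matching the encoded example for 01101 in the text. The `decode` shown here simply reads the string directly; it is not the PCR-based decoding developed later in the report.

```python
# TABLE[position][value] holds the table-mer labeled (position, value) in Figure 4.
TABLE = [
    ["CGTCCATCGT", "GCAGAAGCCA"],  # position 0
    ["CATTCGCGGA", "CGCAAGCTGA"],  # position 1
    ["ACAGTTGCCG", "AGTGGATGCG"],  # position 2
    ["TCGGTAAGCG", "TGCACGAGAC"],  # position 3
    ["GAGCGAACCA", "TCGGAGTGCT"],  # position 4
]
T = 10  # table-mer length t


def encode(digits):
    """Concatenate the table-mer (p, digits[p]) for each position p."""
    return "".join(TABLE[p][v] for p, v in enumerate(digits))


def decode(memory):
    """Recover the digits by slicing the memory into t-mers (direct read, not PCR)."""
    return [TABLE[p].index(memory[p * T:(p + 1) * T]) for p in range(len(TABLE))]


memory = encode([0, 1, 1, 0, 1])
print(memory)
print(decode(memory))  # [0, 1, 1, 0, 1]
```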


For every ssDNA memory there is a corresponding dsDNA memory that is the unique
WC duplex that contains the ssDNA memory. See Figure 5.

5' CGTCCATCGT CGCAAGCTGA AGTGGATGCG TCGGTAAGCG TCGGAGTGCT 3'
3' GCAGGTAGCA GCGTTCGACT TCACCTACGC AGCCATTCGC AGCCTCACGA 5'

Figure 5: A "longer strand" double-stranded WC ComDMem

Henceforth, a ssDNA memory is identified with the unique WC dsDNA memory that
contains it, and the term DNA memory means WC dsDNA memory. For a given
DNA_TC(n,q,t) table code M, let MEM_LIB(n,q,n·t) of M denote the collection of $q^n$ possible
double-stranded n·t bp memories that can be formed by concatenation, where each DNA
memory is identified by its top 5'→3' strand. For example, the DNA memory in Figure 5 is a
member of the MEM_LIB(5,2,50) of Figure 4.
3.2 Polymerase Chain Reaction Laboratory Method


Polymerase chain reaction (PCR) is a technique widely used in molecular biology,
forensic science, environmental science, and many other areas [15-16]. Briefly, PCR is a test
tube system that exponentially replicates a substrand of a DNA memory that is delimited by two
sequence-specific recognition sites (e.g., table-mers) found at the ends of the substrand
to be selectively amplified. By incubating a DNA memory mixture with oligonucleotide
recognition-site PCR primers and the enzyme DNA polymerase, the presence of a pair of
recognition sites on a common substrand of a DNA memory can be determined by whether or
not a PCR amplification occurs.

Key  Idea:  This  PCR  amplification  information  can  be  mathematically  exploited  to  decode 
layered  memories. 

A standard method for detection of amplification involves electrophoretic separation and
detection of DNA substrands on a size-separation medium called a gel. There are other, more
sensitive and faster methods (e.g., real-time PCR) that automate the entire PCR protocol and can
detect amplification. These instruments can very reliably provide the information needed to
run the mathematical algorithms in a cost-effective manner.

3.3  Memory  Design  and  Synthetic  DNA  Code  SynDCode  Software 

The  decoding  accuracy  of  DNA  memories  by  the  PCR  method  depends  upon  whether  or 
not  so-called  false  priming  sites  exist  in  the  memories.  The  priming  sites  for  this  method  are  the 
table-mers  used  to  construct  the  memories.  False  priming  site  sequences  can  arise  if  two  or  more 
of  the  table-mers  are  too  similar  or  if  the  memory  sequence  regions  that  overlap  the  junctions 
where  table-mers  are  concatenated  are  too  similar  to  the  original  table  sequences. 
The synthetic DNA code software SynDCode [17] is a tool developed to design synthetic
DNA sequences to be used in biologically based information systems (e.g., DNA computing,
DNA memories and DNA nanodevices). SynDCode allows for the specification
of thermodynamic distance and dissimilarity so that the synthetic table-mers (and their
complements) do not create false priming sites. The table-mers in Figure 4 were designed by
SynDCode to be non-complementary and non-cross-hybridizing so that each position in a
memory library strand will be (ultra) specific for a unique PCR primer. The fact that SynDCode
gives non-cross-hybridizing output has been experimentally verified repeatedly in the laboratory.
Enhanced SynDCode strand design optimization methods were developed in [25-28].

3.4  The  PCR  Signal  and  PCR  Network  Graph 

As a small example, consider the table-mers in Figure 4 and all 32 distinct memories (one
for each 0,1 string of length 5) in MEM_LIB(5,2,50) formed from Figure 4. For the general
memory library MEM_LIB(n,q,n·t), there are $n(n-1)q^2/2$ primer pairs of table-mers and thus
the same number of distinct PCR reactions, with each memory being positive for exactly
$n(n-1)/2$ of them. In the above example, n = 5 and q = 2, so there are 40 distinct PCR reactions
to perform, with any given memory being positive for 10 of them. Forty may seem like many
reactions, but current PCR technology allows for 768 simultaneous reactions (e.g., Applied
Biosystems Auto-Lid Dual 384-Well GeneAmp® PCR System 9700).
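The two counts can be checked with a short counting argument, given here as a sketch (the binomial notation is a convenience introduced for this note): a primer pair is fixed by choosing two distinct positions and then one of the q table-mers at each, while a single memory carries exactly one table-mer per position.

```latex
\[
\#\{\text{primer pairs}\} = \binom{n}{2}\, q^{2} = \frac{n(n-1)\,q^{2}}{2},
\qquad
\#\{\text{positive reactions per memory}\} = \binom{n}{2} = \frac{n(n-1)}{2}.
\]
\[
n = 5,\ q = 2:\qquad \binom{5}{2}\cdot 2^{2} = 10 \cdot 4 = 40,
\qquad \binom{5}{2} = 10.
\]
```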

Figure 6(a) below is a graphical interpretation, called a PCR network graph, of all
possible PCR reactions from primer pairs of table-mers from Figure 4. The lines connecting the
nodes in the graph denote all possible primer pairs. Notice that there are no lines between primer
pairs with the same first coordinate (e.g., (4,0) and (4,1)). This is because no single memory can
have two distinct table-mers at the same position. In Figure 6(b), the set of bold lines denotes the
set of positive PCR reactions for the DNA memory represented by 01101.
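The PCR network graph can be modeled directly on (position, value) pairs. The sketch below is an illustration only; the function names `network_edges` and `positive_edges` are assumptions, and edges are represented as unordered pairs (`frozenset`) of vertices.

```python
from itertools import combinations


def network_edges(n, q):
    """All primer pairs: two (position, value) vertices with different positions."""
    vertices = [(p, v) for p in range(n) for v in range(q)]
    return {
        frozenset((a, b))
        for a, b in combinations(vertices, 2)
        if a[0] != b[0]  # no line between table-mers at the same position
    }


def positive_edges(digits):
    """Primer pairs that amplify for the single memory with the given digits."""
    return {frozenset(pair) for pair in combinations(list(enumerate(digits)), 2)}


all_edges = network_edges(5, 2)
positive = positive_edges([0, 1, 1, 0, 1])
print(len(all_edges), len(positive))  # 40 10
```

The positive edges for one memory form a complete graph on its five table-mers, which is why the count is 10.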
(a)                    (b)

Figure 6: PCR Network Graph


Key  Idea:  By  using  smaller  DNA  fragments  that  mathematically  constitute  what  is  known  as  a 
combinatorial  cover  (hence  the  name  ComDMem)  the  same  PCR  network  graph  information  can 
be  obtained  that  would  be  received  from  a  longer  DNA  memory. 

Let M be a fixed collection of table-mers DNA_TC(n,q,t). An s-DNA cover of M is a
collection of double-stranded WC duplex concatenations of s table-mers taken from M that
yield exactly the same positive PCR reactions that exist for the entire memory library
MEM_LIB(n,q,n·t) for M. A DNA sequence in an s-DNA cover is called a covering strand.
Since the length of such a covering strand is s·t bp, the s-DNA cover of M is called a
COV_DNA(n,q,s·t) of M. A COV_DNA(n,q,s·t) of M is also referred to as an s-DNA cover of
the memory library MEM_LIB(n,q,n·t) constructed from M.


Key Idea: By using DNA covers of DNA memory libraries, virtual memories, i.e., ComDMems,
can be constructed that behave exactly like the real (and longer) memories in the library with
respect to their PCR signal. Thus, for MEM_LIB(n,q,n·t), instead of having to painstakingly
construct $q^n$ memories, one can construct the shorter strands in COV_DNA(n,q,s·t) and get the
same results by algorithmic mixing to make the ComDMems. This amounts to a substantial cost
reduction. For example, with n = 10, q = 2, s = 3, the reduction is approximately 100 fold.
Moreover, the physical construction of long DNA memory sequences when n·t is greater than
200 is virtually impossible. Thus, to get massive amounts of data storage capability,
COV_DNA(n,q,s·t) must be used.

For example, consider the MEM_LIB(5,2,50) constructed from Figure 4 and let C be a
COV_DNA(5,2,30) 3-cover. The four covering strands cs1, cs2, cs3 and cs4 in C that appear
below in Figure 7 together constitute a virtual ComDMem memory for the actual memory that
appears in Figure 5 above.


cs1 = CGTCCATCGT CGCAAGCTGA AGTGGATGCG    (positions 0, 1, 2)
cs2 = CGTCCATCGT CGCAAGCTGA TCGGAGTGCT    (positions 0, 1, 4)
cs3 = AGTGGATGCG TCGGTAAGCG TCGGAGTGCT    (positions 2, 3, 4)
cs4 = CGTCCATCGT CGCAAGCTGA TCGGTAAGCG    (positions 0, 1, 3)

Each covering strand is the WC duplex containing the 5'→3' strand shown.

Figure 7: Covering Strands for Memory in Figure 5

The virtual aspect of the collection of the four covering strands can be observed in Figure
8. Each of the four covering strands gives rise to three positive PCR reactions. For example, cs3
has positive PCR reactions for the primer pairs in the triangle (3,0), (4,1), (2,1), whose lines are
shaded with the shorter dashes (- -). The triangle of edges that are positive for each covering
strand cs_i is shaded according to the line type associated with cs_i in Figure 7. Note that the line
between (0,0) and (1,1) appears in three of the four triangles and is thus partially highlighted by
three different shadings. Comparing Figure 8 to Figure 6(b), it can be observed that cs1, cs2, cs3
and cs4 in total give the same ten positive PCR reactions as does the single longer memory that
they cover.
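The covering claim can be checked mechanically. The following sketch treats each covering strand as the set of (position, value) table-mers it contains and compares the union of their positive reactions with those of the full memory 01101; the names `edges_of` and `cover` are assumptions made for this illustration.

```python
from itertools import combinations


def edges_of(table_mers):
    """Positive primer pairs for one strand: every pair of its table-mers."""
    return {frozenset(pair) for pair in combinations(table_mers, 2)}


memory = [(0, 0), (1, 1), (2, 1), (3, 0), (4, 1)]  # the memory 01101
cover = {
    "cs1": [(0, 0), (1, 1), (2, 1)],
    "cs2": [(0, 0), (1, 1), (4, 1)],
    "cs3": [(2, 1), (3, 0), (4, 1)],
    "cs4": [(0, 0), (1, 1), (3, 0)],
}

covered = set().union(*(edges_of(strand) for strand in cover.values()))
shared = frozenset({(0, 0), (1, 1)})
times_shared = sum(shared in edges_of(strand) for strand in cover.values())

print(covered == edges_of(memory))  # True: same ten positive reactions
print(times_shared)                 # 3: the (0,0)-(1,1) line lies in three triangles
```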
Key Idea: From the point of view of the positive PCR reactions, the single longer memory is
indistinguishable from the mixture of the covering strands, i.e., the virtual memory ComDMem.


Figure  8:  PCR  Results  from  Covering  Strands  in  Figure  7 


(a)  (b) 

Figure  9:  PCR  Graphs  of  Solely  Positive  PCR  Reactions 


When  a  graphical  representation  of  PCR  reactions  is  given,  only  lines  that  denote  positive 
PCR  reactions  need  to  be  given.  Using  this  representation  Figure  6(b)  becomes  Figure  9(a). 
Using the binary representation of MEM_LIB(5,2,50), Figure 9(b) gives the positive
PCR reactions for the group 11000, 00110, 11100 and 11110 of four layered memories. Note that,
theoretically, Figure 9(b) would be the same whether the four actual MEM_LIB(5,2,50)
sequences 11000, 00110, 11100 and 11110 of 50 bp were combined, or the sixteen covering
strands of 30 bp in COV_DNA(5,2,30) that cover the memories 11000, 00110, 11100 and 11110
were combined.

Key Idea: The physical manufacture of all the DNA memories in a MEM_LIB(n,q,n·t) is an
extremely costly, low-yield and sometimes impossible endeavor, especially for large n and t. With
this combinatorial innovation, one can get the same benefits by using COV_DNA(n,q,s·t) with a
many-fold reduction in cost.

3.5  Oligo  Visualization  with  3DViews 

This task in the AFOSR project helped bridge the gap between the virtual design
and expected interaction of short DNA strands and the physical implementation and real
interactions in a physical experiment. A physical model of the DNA strand and of
strand-to-strand interactions was created. A graphical user interface was created to allow
designers to visualize the complex physical structures and interactions of DNA systems. A
large-scale tiled computer display system was built to provide the large display area with high
pixel resolution needed to display the DNA interactions. A tiled display was created
because it provides a larger viewing area without giving up resolution in the number of
pixels per unit area. In this manner, a 5 x 5 tiled array of monitors has 25 times as many pixels,
and 25 times the image resolution, as a single monitor that is 25 times bigger in area than the
original monitor but has the same total number of pixels. Tiled monitor technology is
becoming important for object visualization in many complex informatics applications.
The completed hardware / software / interaction model system was called 3DViews and
is described in the results section.
4.0 RESULTS, DISCUSSION


4.1 The Mathematical Model

For $2 \le n, q$ let $V_{n,q}$ be the set of all ordered pairs $(p, v_p)$ where $p \in [n]$ and
$v_p \in [q]$. An n-set in $V_{n,q}$, $\{(p, v_p)\}_{p \in [n]}$, where the first coordinates are
distinct, can be uniquely identified with an element of $[q]^n$ and vice-versa. Under this
bijection, each $\tau \in [q]^n$, $\tau = v_0 \ldots v_{n-1}$, is identified with the n-set of ordered
pairs in $V_{n,q}$, $\tau = \{(0, v_0), \ldots, (n-1, v_{n-1})\}$, where the first coordinate designates
the position in the sequence and the second coordinate represents the value at that position.
For example, {(2,0), (0,1), (1,3)} corresponds to 130. Henceforth, $[q]^n$ denotes the n-sets in
$V_{n,q}$ whose first coordinates are distinct.

$E_{n,q}$ denotes the set of all pairs $\{(p_1, v_1), (p_2, v_2)\} \subseteq V_{n,q}$ where
$p_1 \ne p_2$. Then $E_{n,q}$ is the set of all edges in the n-partite graph $G_{n,q}$ on the vertex
set $V_{n,q}$, where the independent sets are collections of vertices with the same first
coordinate. Further identify $\tau = \{(0, v_0), \ldots, (n-1, v_{n-1})\}$ with the complete
subgraph, denoted $K_\tau$, of $G_{n,q}$ on the vertices in $\tau \in [q]^n$.


The correspondence between the mathematical and physical entities is as follows: $V_{n,q}$ is
identified with the set of table-mers S, MEM_LIB(n,q,n·t) is identified with $[q]^n$ and
$E_{n,q}$ is identified with all possible PCR reactions. This latter identification is less obvious
than the others. A pair of primers $s_{p_1,v_1}$, $\overline{s}_{p_2,v_2}$ where
$0 \le p_1 < p_2 \le n-1$ corresponds to a unique PCR reaction. Then, identifying
$\{s_{p_1,v_1}, \overline{s}_{p_2,v_2}\}$ with $\{(p_1, v_1), (p_2, v_2)\}$, the identification of
$E_{n,q}$ with PCR reactions is observed.
Using these identifications, given a pool of sequences, $U = \{\tau_i\}$, from $[q]^n$,
consider an edge $e = \{(p_1, v_1), (p_2, v_2)\}$ in $E_{n,q}$. Say that e is positive for U if and
only if there is a $\tau \in U$ such that $\tau$ has value $v_1$ in position $p_1$ and value $v_2$ in
position $p_2$. Considering U as a pool of dsDNA strands taken from MEM_LIB(n,q,n·t) and
considering e as the PCR reaction for primers $s_{p_1,v_1}$, $\overline{s}_{p_2,v_2}$, then e is
positive for pool U if and only if an exponential amplification results from exposing the sample
U to the PCR reaction $s_{p_1,v_1}$, $\overline{s}_{p_2,v_2}$. Experimentally, this exponential
amplification can be observed in many ways. Some of these ways are described as
conventional gel-based and SYBR green and/or TaqMan-based real-time PCR [12], [16].

Finally, given a pool of sequences from $[q]^n$, $U = \{\tau_i\}$, let $G_U$ denote the
subgraph of $G_{n,q}$ that consists of all the edges positive for U. $G_U$ is the graph-theoretic
union of the complete subgraphs $K_{\tau_i}$. If U is considered to be a pool of dsDNA strands
taken from MEM_LIB(n,q,n·t), then $G_U$ is identified with the collection of all positive PCR
reactions taken over all possible pairs of primers $s_{p_1,v_1}$, $\overline{s}_{p_2,v_2}$. In either
the mathematical or physical setting, the goal is to identify U given $G_U$. The interesting
applications come from the fact that $G_U$ can be obtained from experimentation without direct
knowledge of the contents of U.

Consider the set of strands S given in Figure 4. A description of the actual physical
construction of the dsDNA library MEM_LIB(5,2,50) appears in [8]. Suppose a pool U taken from
MEM_LIB(5,2,50) consists of the duplexes identified by 11000, 00110, 11100 and 11110.
Then $G_U$ is given in Figure 10, the graph $G_U$ depicting all the positive PCR reactions from
Figure 4.


Figure 10: PCR graph $G_U$ with U = {11000, 00110, 11100, 11110}.

A closer look at an aspect of Figure 4 can aid the discussion. Consider the edge {(1,1),
(4,0)}. From Table 2, this edge corresponds to the PCR reaction primed by
$s_{1,1}$ = TCACACACACACACACAATT and the complement of $s_{4,0}$ = TCTCCTCTCCACTCAAAACC.
This PCR reaction yields amplification because the dsDNA strands 11000 and 11100 are members
of U that have the values 1 and 0 in the 1st and 4th positions respectively. (The position count
starts with 0.) Note that the PCR reaction primed by $s_{0,0}$ = CCAAACCTCCACTTTCCAAC and the
complement of $s_{2,0}$ = CCTTTCCTCCATCACCTCAT, corresponding to edge {(0,0), (2,0)}, does not
yield an amplification because no strand in U = {11000, 00110, 11100, 11110} has the value 0 in
both the 0th and 2nd positions.
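The two edge checks above can be reproduced with a small model of the pool. This is a sketch on the abstract (position, value) graph only, with `clique_edges` and `pcr_graph` as assumed helper names; it says nothing about the laboratory behavior of the actual primers.

```python
from itertools import combinations


def clique_edges(bits):
    """Edges of the complete subgraph for one memory, given as a 0/1 string."""
    vertices = [(p, int(b)) for p, b in enumerate(bits)]
    return {frozenset(pair) for pair in combinations(vertices, 2)}


def pcr_graph(pool):
    """Union of the complete subgraphs of all memories in the pool."""
    graph = set()
    for bits in pool:
        graph |= clique_edges(bits)
    return graph


U = ["11000", "00110", "11100", "11110"]
G_U = pcr_graph(U)

print(frozenset({(1, 1), (4, 0)}) in G_U)  # True: amplification expected
print(frozenset({(0, 0), (2, 0)}) in G_U)  # False: no amplification expected
```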
4.2 The Identification Algorithms


To identify U from $G_U$, two approaches are taken. The methods are generalizations of
those of combinatorial group testing [18-24]. The first is called the disjunct algorithm; it
identifies the strands in MEM_LIB(n,q,n·t) that are surely not in U. The second is called edge
representative decoding; it identifies the strands in MEM_LIB(n,q,n·t) that surely are in U.

Call the disjoint sets of strands identified by these algorithms the resolved negatives and
resolved positives, denoted RN and RP respectively. From the definitions of these sets:

    $RP \subseteq U \subseteq L_{n,q}(S) - RN$.    (1)

Hence, if $RP = L_{n,q}(S) - RN$, then $U = RP = L_{n,q}(S) - RN$.

The disjunct algorithm is simple to state:

Disjunct Algorithm: Any sequence $\tau \in [q]^n$, thought of as a complete subgraph in
$G_{n,q}$, that has an edge that does not appear in $G_U$ is a member of RN.

The disjunct algorithm works because every edge of every $\tau \in U$ corresponds to a
positive PCR reaction.

The  edge  representative  decoding  is  a  little  more  complicated. 
Edge Representative Decoding: Any sequence $\tau \in [q]^n$, thought of as a complete subgraph
in $G_{n,q}$, that is also a complete subgraph in $G_U$ and that has an edge that is not contained
in any other complete subgraph in $G_U$ is a member of RP. In other words, $\tau \in RP$ if and
only if $K_\tau$ is a complete subgraph of $G_U$ that has an edge that is not contained in any other
complete subgraph $K_{\tau'}$ in $G_U$ with $K_{\tau'} \ne K_\tau$.

Edge representative decoding works because every edge in $G_U$ is contained in a
complete subgraph $K_\tau$ for some $\tau \in U$. Thus, if an edge in $G_U$ is contained in a unique
complete subgraph $K_\tau$ of $G_U$, then that subgraph must be for some $\tau \in U$.
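Both decoding rules can be illustrated by brute force on the small example pool. The sketch below enumerates all $q^n$ sequences, which is only feasible for toy sizes and is not the implementation used in the report's decoding software; the names `clique`, `pcr_graph`, `disjunct` and `edge_representative` are assumptions made for this example.

```python
from itertools import combinations, product


def clique(seq):
    """Edges of the complete subgraph K_tau for a tuple of digits."""
    return {frozenset(pair) for pair in combinations(list(enumerate(seq)), 2)}


def pcr_graph(pool):
    """G_U: union of the complete subgraphs of the pooled memories."""
    graph = set()
    for seq in pool:
        graph |= clique(seq)
    return graph


def disjunct(g_u, n, q):
    """Return (RN, survivors): sequences with a missing edge go to RN."""
    rn, survivors = set(), set()
    for seq in product(range(q), repeat=n):
        (survivors if clique(seq) <= g_u else rn).add(seq)
    return rn, survivors


def edge_representative(survivors):
    """RP: complete subgraphs owning an edge found in no other complete subgraph."""
    rp = set()
    for seq in survivors:
        other_edges = set()
        for other in survivors - {seq}:
            other_edges |= clique(other)
        if clique(seq) - other_edges:
            rp.add(seq)
    return rp


U = {(1, 1, 0, 0, 0), (0, 0, 1, 1, 0), (1, 1, 1, 0, 0), (1, 1, 1, 1, 0)}
RN, survivors = disjunct(pcr_graph(U), n=5, q=2)
RP = edge_representative(survivors)
print(len(RN), RP == U)  # 28 True
```

For this pool the library minus RN already equals U, and RP equals U as well, so the condition following equation (1) holds and the pool is fully resolved.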

4.3  Algorithmic  Implementation 

This section gives the graph-theoretic algorithms on the abstract PCR graph that are
implemented in the PCR data decoding software.

Consider information storage as a set of data values, $S = \{s_k\}$. Each data value $s_k$ is a
ComDMem and each ComDMem is a set of ordered pairs, i.e., $s_k = \{(i, j)\}_{0 \le i < N}$, where
each ordered pair (i, j) is a table-mer. The fundamental question now becomes: how to
effectively implement the representative decoding so that S can be found.

The algorithms for reconstructing S are graph-theoretic in nature, so one must
reformulate the problem. Let $G^0 = (V^0, E^0)$ be our PCR graph, i.e., vertices are table-mers,
edges are positive PCR reactions and a ComDMem is a clique of size N.
Let $G^t = (V^t, E^t)$ be the graph with the following properties:

1) Each vertex $v \in V^t$ has associated with it a set of pairs {(i, j)} with $0 \le i < N$ and
$0 \le j < q$.

2) If (i, j) and (k, l) are distinct pairs in the set associated with a vertex $v \in V^t$, then
$i \ne k$.

3) If the vertices of an edge $(v, w) \in E^t$ have associated sets of pairs $p_v$ and $p_w$, and
(i, j), (k, l) are distinct pairs in $p_v \cup p_w$, then $i \ne k$.

This $G^t = (V^t, E^t)$ graph type is an extension of the PCR graph $G^0 = (V^0, E^0)$. Figure 11
illustrates the extension. Instead of each node having just one pair (i, j), it has a set of pairs
{(i, j)}. To make viewing easier, each set of pairs is represented by an N-tuple. For example,
{(2, 0)} is represented by (*, *, 0), which represents an unfolding ComDMem. Each vertex
$v \in V^t$ is equivalent to a subset of the pairs from a data entry $s_k \in S$, in other words a
partial or fuzzy ComDMem. Each edge is equivalent to positive PCR reactions. One can construct
a sequence of graphs satisfying the properties above using Algorithm 1 given in Figure 12.


G”  G' 

Figure  11:  Gt-1  to  Gt  Graph  Extension  Scheme 
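Let p[w] be the set of pairs associated with node w.

// Each edge in G^{t-1} generates a node in G^t
for all (u^{t-1}, v^{t-1}) ∈ E^{t-1} do
    Create a new node w^t ∈ V^t.
    p[w^t] ← p[u^{t-1}] ∪ p[v^{t-1}]
end for

// Small cliques in G^{t-1} generate edges in G^t
for all (u_1^{t-1}, u_2^{t-1}) ∈ E^{t-1} do
    // N_1 ⊂ V^{t-1} are the neighbors of u_1^{t-1}
    N_1 ← ∅
    for all u_3^{t-1} ∈ V^{t-1} such that (u_1^{t-1}, u_3^{t-1}) ∈ E^{t-1} do
        N_1 ← N_1 ∪ {u_3^{t-1}}
    end for

    // N_2 ⊂ V^{t-1} are the neighbors of u_2^{t-1}
    N_2 ← ∅
    for all u_4^{t-1} ∈ V^{t-1} such that (u_2^{t-1}, u_4^{t-1}) ∈ E^{t-1} do
        N_2 ← N_2 ∪ {u_4^{t-1}}
    end for

    // N ⊂ V^{t-1} are the neighbors of both u_1^{t-1} and u_2^{t-1}
    N ← N_1 ∩ N_2
    w^t ← node in V^t created from (u_1^{t-1}, u_2^{t-1})

    for all u_3^{t-1} ∈ N ⊂ V^{t-1} do
        // u_1^{t-1}, u_2^{t-1}, u_3^{t-1} ∈ V^{t-1} form a 3-clique.
        v^t ← node in V^t created from (u_1^{t-1}, u_3^{t-1})
        u^t ← node in V^t created from (u_2^{t-1}, u_3^{t-1})
        E^t ← E^t ∪ {(v^t, u^t)}

        for all u_4^{t-1} ∈ N − {u_3^{t-1}} ⊂ V^{t-1} do
            if (u_3^{t-1}, u_4^{t-1}) ∈ E^{t-1} then
                // u_1^{t-1}, u_2^{t-1}, u_3^{t-1}, u_4^{t-1} ∈ V^{t-1} form a 4-clique.
                x^t ← node in V^t created from (u_3^{t-1}, u_4^{t-1})
                E^t ← E^t ∪ {(w^t, x^t)}
            end if
        end for
    end for
end for


Figure 12: G^{t-1} to G^t Graph Extension Scheme, Algorithm 1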


By starting the sequence with the original PCR graph G^0, Algorithm 1 provides a means of
finding all ComDMem cliques in G^0. The idea is that each edge in G^{t-1} generates a node in
G^t. Two nodes have an edge in G^t if their constituent edges from G^{t-1} form a clique of size 3
or 4. Figure 13 illustrates the application of Algorithm 1.
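
One extension step can be sketched in Python as follows. This is a minimal illustration under
stated assumptions, not the project's decoding software: the function names pcr_graph and
extend_graph and the toy data set are hypothetical, and the sketch tests every pair of edges
directly for the clique condition instead of walking neighbor sets, which is simpler but slower.

```python
from itertools import combinations


def pcr_graph(entries):
    """Build G^0: one node per (position, value) pair, one edge per co-occurring pair."""
    labels, edges = {}, set()
    for entry in entries:
        pairs = list(enumerate(entry))  # (i, j) = (position, value)
        for pair in pairs:
            labels[pair] = frozenset({pair})
        edges.update(frozenset(edge) for edge in combinations(pairs, 2))
    return labels, edges


def extend_graph(labels, edges):
    """One G^{t-1} -> G^t step: edges become nodes, 3- and 4-cliques become edges."""
    adjacent = {node: set() for node in labels}
    for u, v in map(tuple, edges):
        adjacent[u].add(v)
        adjacent[v].add(u)

    # Each edge of G^{t-1} generates a node of G^t labelled with the union of pair sets.
    new_labels = {edge: frozenset().union(*(labels[n] for n in edge)) for edge in edges}

    # Two new nodes are joined when their constituent edges span a clique of size 3 or 4.
    new_edges = set()
    for e1, e2 in combinations(edges, 2):
        if all(b in adjacent[a] for a, b in combinations(e1 | e2, 2)):
            new_edges.add(frozenset({e1, e2}))
    return new_labels, new_edges


# Hypothetical toy data set S with N = 3 (two ComDMems sharing the pair (0, 0)).
labels0, edges0 = pcr_graph([(0, 0, 0), (0, 1, 1)])
labels1, edges1 = extend_graph(labels0, edges0)
print(len(labels0), len(edges0), len(labels1), len(edges1))
```

In the toy case each ComDMem is a triangle in G^0, so every node of G^1 carries two pairs and
only edges taken from the same triangle become adjacent.
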
Figure 13: How Algorithm 1 Leads to ComDMem Decoding

Successive application of Algorithm 1 would eventually lead us to all the cliques in the
original graph G^0, but at great computational cost. Instead, by applying the unique edge
representative method, one can take advantage of the fact that G^0 was constructed as the
graph-theoretic union of cliques of size N, i.e., ComDMems. Examine the edge between (*,0,*)
and one of its neighbors in G^0 from Figure 13. The nodes and the intersection of their
neighbors form exactly one data entry (0,0,0). Since each edge comes from at least one data
entry, this means that a ComDMem has been found. This edge searching algorithm, presented in
Algorithm 2, allows us to find entries in S early in the sequence of graphs G^0, G^1, ....
Data entries found this way in G^0 must be in the original data set S.

Let p[w] be the set of pairs associated with node w.
S' ← ∅

for all (u^t, v^t) ∈ E^t do
    P ← p[u^t] ∪ p[v^t]

    for all w^t ∈ V^t such that (u^t, w^t), (v^t, w^t) ∈ E^t do
        P ← P ∪ p[w^t]
    end for

    if |P| = N and ((i, j), (l, m) ∈ P ⟹ i ≠ l) then
        S' ← S' ∪ {P}
    end if
end for


Figure 14: Unique Edge Representative Computational Cost Reduction, Algorithm 2

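
The edge search of Algorithm 2 can be sketched in Python as follows. This is a hedged
illustration rather than the project's implementation: the function names pcr_graph and
edge_representative_decode and the toy data set are hypothetical, and each node is assumed to
be labelled with a set of (position, value) pairs.

```python
from itertools import combinations


def pcr_graph(entries):
    """Build G^0: one node per (position, value) pair, one edge per co-occurring pair."""
    labels, edges = {}, set()
    for entry in entries:
        pairs = list(enumerate(entry))  # (i, j) = (position, value)
        for pair in pairs:
            labels[pair] = frozenset({pair})
        edges.update(frozenset(edge) for edge in combinations(pairs, 2))
    return labels, edges


def edge_representative_decode(labels, edges, n):
    """Return the data entries that some edge identifies uniquely."""
    adjacent = {node: set() for node in labels}
    for u, v in map(tuple, edges):
        adjacent[u].add(v)
        adjacent[v].add(u)

    found = set()
    for u, v in map(tuple, edges):
        # P = pairs of the two endpoints plus pairs of all their common neighbors.
        pairs = set(labels[u] | labels[v])
        for w in adjacent[u] & adjacent[v]:
            pairs |= labels[w]
        positions = {i for i, _ in pairs}
        # Accept P only if it has N pairs, one for each position.
        if len(pairs) == n and len(positions) == n:
            found.add(tuple(value for _, value in sorted(pairs)))
    return found


data = {(0, 0, 0), (0, 1, 1)}  # hypothetical data set S with N = 3
labels0, edges0 = pcr_graph(data)
decoded = edge_representative_decode(labels0, edges0, n=3)
print(sorted(decoded))
```
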
4.4  The  Algorithms  Applied  to  Physical  Experimental  Data 

To exhibit the above algorithms, actual dsDNA experiments were performed on
MEM_LIB(5, 2, 50) for Figure 4. A description of the physical construction of this
MEM_LIB(5, 2, 50) appears in [8]. From MEM_LIB(5, 2, 50), the four sequences that were
selected and taken as U are given in Figure 10. To actually select strands from this library, a
cloning method was used. The library was amplified with outside primers, the amplified product
was cut with BamHI and HindIII, and the expected fragment was purified and then ligated into the
vector pBluescript [8]. Four of a total of 12 isolated clones were selected to be the pooled
sample U. Before these four strands (clones) were pooled, the individual dsDNA were
sequenced to determine which library members were actually selected. To exhibit the
experimental design and analysis, an incidence matrix is useful and is given in Figure 15. In the
actual experiments, essentially no PCR errors occurred and the empirical outcomes seen in
Figure 15 were in 100% agreement with the theoretical outcomes that can be found in the last
two rows of Figure 15. This is a testament to the SynDCode design method. A portion of the gel
output of the actual PCR experiments performed on this U is given in Figure 16.

Figure 15: Record of Actual PCR Results


The sequences in MEM_LIB(5, 2, 50) are given vertically as labels for the columns of the
incidence matrix in Table 3 and are numbered 1-32. The sequences in U are distinguished by
boldface font and are in columns 7, 25, 29 and 31. The PCR reactions, i.e., the edges in G_{5,2},
correspond to the rows and are numbered 1-40. The edge labels are given in either the positive or
negative PCR columns depending upon whether the given edge is positive or negative for U.
Every entry in the matrix corresponds to a pair (PCR reaction, sequence). There is a 1 in a given
entry (i, j) if and only if the sequence j is (theoretically) positive for PCR reaction i. Using our
mathematical representation, each entry corresponds to a pair (edge, complete subgraph) and
there is a 1 in that entry if the given edge is contained in the given complete subgraph. The
disjunct algorithm uses only the negative PCR reactions, which are listed in the last column, and
the edge representative decoding algorithm uses only the positive PCR reactions, which are given
in the penultimate column. In the actual experiment, whose raw results can be seen in Figure 5,
the pooled dsDNA sample is separately exposed to all forty pairs of PCR primers.

Using Figure 15 and focusing on the disjunct algorithm, sequences 9-16 are in RN by
virtue of PCR reaction 2, because each of the sequences 9-16 contains PCR reaction 2 as an edge
and PCR reaction 2 was negative for the given U. Thus columns 9-16 are labeled m(2), which is
meant to denote that these sequences are in RN by virtue of PCR reaction 2 being negative.
Other PCR reactions may also indicate that these sequences are in RN, but PCR reaction 2 is the
first in our ordering to do so. Similarly, sequences 17-24 are labeled m(3), sequences 1-4 are
labeled m(5), sequences 5-6 are labeled m(9), sequence 8 is labeled m(14), sequences 26, 28, 30
and 32 are labeled m(16), and sequence 27 is labeled m(30). Thus RN = {1-4, 5-6, 8, 9-16, 17-24,
26, 27, 28, 30, 32}.

Using Figure 15 and focusing on the edge representative decoding, sequence 7 is identified
as being in RP, because the complete graph K_x, where x = 00110 is the column 7 label, is the only
complete subgraph of Gy that contains the edge {(0,0), (1,0)}, which denotes the positive PCR
reaction 1. Thus column 7 is labeled RP(1), which is meant to denote that this sequence is in RP by
virtue of PCR reaction 1 being positive. Other PCR reactions may also indicate that this sequence
is in RP, but PCR reaction 1 is the first in our ordering to do so. Similarly, sequences 25, 31 and
29 are respectively identified by the positive PCR reactions 7, 12 and 31, and columns 25, 31 and
29 are respectively labeled RP(7), RP(12) and RP(31). Since RP = L_{5,2}(S) − RN, then

U = RP = L_{5,2}(S) − RN.

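
The two decoding passes for this example can be replayed with a short Python sketch. It is an
illustration under stated assumptions, not the project's decoding software: the names
REACTIONS, LIBRARY, contains and decode are hypothetical, the positive and negative outcomes
are simulated from the pool instead of being read from a gel, and the reaction ordering
(position pairs in lexicographic order, then value pairs) is an assumption chosen because it
agrees with the reaction numbers quoted in the text.

```python
from itertools import combinations, product

N, Q = 5, 2  # positions per strand and symbols per position, as in MEM_LIB(5, 2, 50)

# Assumed numbering: reaction 1 is {(0,0), (1,0)}, reaction 2 is {(0,0), (1,1)}, and so on.
REACTIONS = [((i, a), (j, b))
             for i, j in combinations(range(N), 2)
             for a, b in product(range(Q), repeat=2)]

# Column k of the incidence matrix (1-based) is LIBRARY[k - 1]; column 7 is 00110.
LIBRARY = list(product(range(Q), repeat=N))


def contains(seq, reaction):
    """True if the sequence carries both table-mers of the primer pair."""
    (i, a), (j, b) = reaction
    return seq[i] == a and seq[j] == b


def decode(pool):
    """Return (rn, rp): column -> first reaction that places the column in RN or RP."""
    positive = [any(contains(x, r) for x in pool) for r in REACTIONS]

    # Disjunct pass: a sequence containing any negative edge cannot be in the pool.
    rn = {}
    for col, seq in enumerate(LIBRARY, start=1):
        for num, reaction in enumerate(REACTIONS, start=1):
            if not positive[num - 1] and contains(seq, reaction):
                rn[col] = num
                break

    # Edge representative pass: a positive edge lying in exactly one remaining
    # complete subgraph certifies that subgraph.
    survivors = [col for col in range(1, len(LIBRARY) + 1) if col not in rn]
    rp = {}
    for num, reaction in enumerate(REACTIONS, start=1):
        if positive[num - 1]:
            owners = [c for c in survivors if contains(LIBRARY[c - 1], reaction)]
            if len(owners) == 1:
                rp.setdefault(owners[0], num)
    return rn, rp


pool = [LIBRARY[c - 1] for c in (7, 25, 29, 31)]
rn, rp = decode(pool)
print("RP labels:", rp)
print("RN size:", len(rn))
```
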
Figure 16: A Portion of the Electrophoresis Gel from the PCR


Figure 16 gives a portion of the electrophoresis gel from the PCR reactions whose
positive and negative results are recorded in Figure 15. The lanes where bands can be seen are
positive PCR reactions for the encoded primers given at the bottom of the lane. The row number
of Figure 15 that corresponds to a lane appears in Figure 16 directly below the encoded primers
for the given lane. In all, there were forty separate PCR reactions primed by the forty
primer pairs, each with the same dsDNA sample U. Each reaction occurred in a separate well,
with each well corresponding to a distinct lane in the gel.

4.5  The  General  Setting,  Parameters  and  Simulated  Performance 


In general, the size of MEM_LIB(n, q, n·t) is q^n and the number of PCR reactions for
this library is

    \binom{n}{2} q^2.

In Figure 17, the outcome of simulated performance is given and compared.


Memory Size, # pairwise associations (q^n = 6^9)    ~10^7
Simultaneous Records Accessed (picked)              20
Simultaneous Associations Accessed @15              300
Accuracy                                            97%
SynDCode Strands Required (nq)                      54
Length of DNA Memory Strands                        180
Number of DNA Library Strands                       ~10^7
Number of PCR Reaction Wells                        324
Number of PCR Reactions                             2916
Computer Clock Cycles Required                      O(10^…)

DNA data structure requires O(n^2 q^2) PCR reactions, O(n q^2) wells.
Standard computer requires O(n^2 q^n) clock cycles.

Figure 17: Simulations of Algorithmic Performance

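
The parameter counts in Figure 17 can be checked with a few lines of Python. This is a hedged
sketch: the function name library_parameters is hypothetical, and the values n = 9 and q = 6
are not stated in the text but are inferred here from the figure entries nq = 54 and 324 wells.

```python
from math import comb


def library_parameters(n, q):
    """Counts for MEM_LIB(n, q, n*t): n positions with q table-mers per position."""
    return {
        "library_size": q ** n,                   # number of distinct memory strands
        "pairwise_reactions": comb(n, 2) * q ** 2,  # edges of the PCR graph
        "syndcode_strands": n * q,                # nq
        "wells_bound": n * q ** 2,                # O(n q^2)
        "reactions_bound": n ** 2 * q ** 2,       # O(n^2 q^2)
    }


small = library_parameters(5, 2)   # the library used in the physical experiment
large = library_parameters(9, 6)   # inferred parameters for the simulated library
print(small["library_size"], small["pairwise_reactions"])
print(large["syndcode_strands"], large["wells_bound"], large["reactions_bound"])
```

With n = 5 and q = 2 the sketch returns the 32 sequences and 40 PCR reactions of the
experiment in Section 4.4.
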
4.6  Visualization with 3DViews


The system 3DViews was created to provide visualization of oligo interactions. A model
was created to represent the physicality of oligos, their structure, movement and interactions. A
user interface was created to graphically display the model results. For DNA libraries that are
large enough to be of interest, the model graphics output produces large and detailed images.
Current computer screens do not have sufficient pixel densities to display the level of detail
desired from a simulation, and the objects would be too small to see if they did. A tiled display
system was built to increase the size of the total display without giving up fine detail.

The oligo shape was modeled as an elongated ellipsoid with short axis a and long axis b.
For gross movement this approximation is justified by the rigidity of short oligos and the shape
of the polar charge. Oligo movement was modeled by a Brownian motion three-dimensional
random walk. The one-dimensional diffusion coefficient D for the ellipsoid shape with three
independent directions is:

    D = \frac{k_B T \left[ \ln(2b) - \ln(a) \right]}{6 \pi \eta b}

where T is temperature, k_B is Boltzmann's constant, and \eta is the viscosity of the medium. The
random walk motion is modeled by assuming the oligo is on a three-dimensional lattice and may
move a step distance dl in a step time dt. In m time steps, the oligo will move n grid points with
equal probability. In the random walk, the Brownian motion is approximated by:

    D = \frac{(n \, dl)^2}{6 \, m \, dt}

From these two equations, motion of a group of oligos was mapped through space by the motion
model.
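
The link between the two expressions for D can be sketched in Python. This is an illustrative
sketch, not the 3DViews motion model: the function names, the oligo dimensions, the
temperature and the viscosity are all assumed values, and (n dl)^2 is interpreted here as the
mean squared net displacement over many walkers.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant in J/K


def ellipsoid_diffusion(a, b, temperature, viscosity):
    """D = k_B T [ln(2b) - ln(a)] / (6 pi eta b) for short axis a and long axis b."""
    return K_B * temperature * (math.log(2 * b) - math.log(a)) / (6 * math.pi * viscosity * b)


def lattice_step_time(d, dl):
    """Step time dt for which an unbiased lattice walk with step dl reproduces D."""
    return dl ** 2 / (6 * d)


def estimate_diffusion(dl, dt, steps, walkers, seed=0):
    """Estimate D = <(n dl)^2> / (6 m dt) from a simulated 3D lattice random walk."""
    rng = random.Random(seed)
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    total = 0
    for _ in range(walkers):
        x = y = z = 0
        for _ in range(steps):
            dx, dy, dz = rng.choice(moves)
            x, y, z = x + dx, y + dy, z + dz
        total += x * x + y * y + z * z
    mean_square = total / walkers * dl ** 2
    return mean_square / (6 * steps * dt)


# Assumed values: 1 nm by 3.4 nm ellipsoid in water at about 25 C, 1 nm lattice spacing.
d_model = ellipsoid_diffusion(a=1e-9, b=3.4e-9, temperature=298.15, viscosity=8.9e-4)
dl = 1e-9
dt = lattice_step_time(d_model, dl)
d_walk = estimate_diffusion(dl, dt, steps=100, walkers=2000)
print(d_model, d_walk)
```
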


Reactions  between  two  or  more  oligos  that  land  on  the  same  grid  point  were  modeled  by 
assuming  a  diffuse  solution  with  a  Boltzmann  distribution  in  the  probability  of  oligos  landing  on 
a  grid  point.  The  reaction  between  multiple  oligos  landing  on  the  same  point  was  modeled  by 
the Boltzmann distribution for interaction states, where the probability P_j of state j is:

    P_j = \frac{\exp\left(-\Delta G_j / k_B T\right)}{Z}, \qquad Z = \sum_i \exp\left(-\frac{\Delta G_i}{k_B T}\right)

where T is the temperature, k_B is Boltzmann's constant, and \Delta G_j is the free energy
difference of state j. The oligo model and design tool SynDCode was used to approximate
\Delta G_j [17].
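
A minimal Python sketch of the state probabilities follows. The function name
boltzmann_probabilities and the free energy values are hypothetical and are used only for
illustration; in the project the free energies were approximated with SynDCode, which is not
reproduced here.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def boltzmann_probabilities(delta_g, temperature):
    """Return P_j = exp(-dG_j / k_B T) / Z for free energies given in joules."""
    scaled = [-g / (K_B * temperature) for g in delta_g]
    shift = max(scaled)  # subtract the largest exponent to avoid overflow
    weights = [math.exp(s - shift) for s in scaled]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical free energies for three interaction states, in joules per molecule.
states = [-4.0e-20, -2.0e-20, 0.0]
probabilities = boltzmann_probabilities(states, temperature=310.0)
print(probabilities)
```

The state with the lowest free energy receives the largest probability, and the probabilities
sum to one by construction.
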

Rendering the model output was a significant issue due to the large number of objects to be
displayed on the computer screen. High resolution was needed to view individual hybridization
reactions. To view the various kinetics permutations, a large number of grid points were needed.
A tiled display system, the Mobile Stream Processing Cluster (MOSAIC), was built to aid in
visualization of the system. The finished modeling cluster and display system is shown in
Figure 18. A set of nine 1920 x 1200 pixel monitors was tiled 3 x 3 on a stand which also holds
the computer cluster and power supplies. The result was a continuous 5760 x 3600 pixel display.


Figure  18.  MOSAIC  Cluster 



To run the visualization model and drive the display, three 8-core Apple Mac Pros were
used with 32 GB of RAM each. Red Hat Enterprise Linux version 5.x was used for the operating
system. Each Mac Pro was given three ATI Radeon graphics cards, one for each monitor in the
tile display. The computers were connected with 10 Gb Ethernet.

The oligo interaction model run on the cluster creates a continuous series of OpenGL
calls that represent the graphical output of the model. The distributed graphics processing
application Chromium was used to render the graphical output across the nine displays in real
time. The result was a high fidelity physical model of the diffusion and interaction
thermodynamics of a large set of oligos and a 9x improvement in resolution in display of the
model output.
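
The tiling arithmetic can be illustrated with a short Python sketch. It is not how Chromium
partitions the OpenGL stream: the function name locate_pixel is hypothetical, and assigning one
row of monitors to each computer is an assumption made only for this example.

```python
TILE_WIDTH, TILE_HEIGHT = 1920, 1200  # pixels per monitor
COLUMNS, ROWS = 3, 3                  # monitors tiled 3 x 3


def locate_pixel(x, y):
    """Map a pixel of the 5760 x 3600 display to (host, row, column, local_x, local_y)."""
    if not (0 <= x < TILE_WIDTH * COLUMNS and 0 <= y < TILE_HEIGHT * ROWS):
        raise ValueError("pixel lies outside the tiled display")
    column, local_x = divmod(x, TILE_WIDTH)
    row, local_y = divmod(y, TILE_HEIGHT)
    host = row  # assumed: each of the three computers drives one row of monitors
    return host, row, column, local_x, local_y


print(TILE_WIDTH * COLUMNS, TILE_HEIGHT * ROWS)
print(locate_pixel(1920, 1200))
```
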



5.0  CONCLUSIONS 


This project developed a synthetic DNA-based associative memory called ComDMems
that, unlike conventional silicon-based associative memories, provides for a high degree of input
parallelization that allows for a significant reduction in required data structure queries.

This innovation combines mathematics and molecular biology. First, it uses mathematics
to design the synthetic DNA that makes the storage of information in ComDMem possible. Then
it uses the specificity of DNA strand recognition and the wet laboratory method of polymerase
chain reaction (PCR) to store information and to generate a signal. Finally, it uses mathematics
to decode the PCR signal, identify the ComDMem signatures and reveal the information and
associations they contain.

By using mathematical combinations of short "covering strands" in place of each single
and longer memory strand, covert ComDMems can encode a vast amount of information in a
more efficient way, and this encoded information can be retrieved only by an authorized user.
A uniform method of covering strand construction that minimizes the number of covering
strands and theoretically and experimentally mimics the behavior of the longer memory strands
was given. This project demonstrated a method of decoding the PCR output that minimizes the
number of PCR reactions for a given number or distribution of superimposed or associated
ComDMems.

These synthetic ComDMems are feasibly functional at concentrations that are below the
parts per billion level. Thus, they could not be reverse engineered because their detection would
only be possible with prior knowledge of the memory-specific DNA sequences required for PCR
amplification. Hence, ComDMem synthetic DNA memories are highly covert. ComDMems can
encode item or process information as a numerical sequence in DNA, are highly covert, are
capable of carrying a virtually unlimited number of data fields, and are deeply superimposable
and thus associative.

In general, the decoding of associative DNA memory has been an intractable problem for
processes requiring deeply superimposed memories. However, ComDMems are constructed in a
sophisticated combinatorial manner so that the decoding of such deeply associative memories is
feasible. Thus, beyond being covert and information-rich, our DNA memories can enable design
of efficient, scalable and technically useful libraries of synthetic DNA for use in high
performance associative memory.

The MOSAIC cluster has been successfully transitioned for use in three projects to date.
It is home to Distributed Quantum Computing simulation work where multi-thread and parallel
processing are blended to reduce latency and maximize information exchange between the
systems. It is also the main demonstration platform for the SWATHBUCKLER project, which
requires the use of MOSAIC's nine high-definition displays to view wide area Synthetic
Aperture Radar data. Finally, it supports an Air Force Research Laboratory neuromorphic
computing camera project which will eventually use the nine-tile display to view different
algorithmic approaches of computing in a neuromorphic design.

7.0  ACRONYMS


CAM       Content Addressable Memory
DNA       Deoxyribonucleic Acid
dsDNA     double stranded DNA
MOSAIC    Mobile Stream Processing Cluster
PCR       Polymerase Chain Reaction
RAM       Random Access Memory
ssDNA     single stranded DNA
WC        Watson-Crick
A         Adenine
C         Cytosine
G         Guanine
T         Thymine